Real-time Intelligent Image Manipulation System

ABSTRACT

A system for manipulating images according to styles chosen by a user includes a feed-forward image manipulation model for everyday use and an optimization image manipulation model for more professional use. The optimization image manipulation model optimizes directly over output image pixels to minimize both the content loss and style loss. Users can select their own content and style images, and can choose between using the feed-forward image manipulation model and optimization image manipulation model.

FIELD OF THE INVENTION

The present invention generally relates to image processing. More specifically, it relates to a system for real-time intelligent manipulation of images into corresponding output images given user-specified guiding styles.

BACKGROUND

Image manipulation, which aims to manipulate an input image based on a personalized guiding style image (e.g., art paintings), has recently attracted ever-growing research interest and derived various real-world applications, such as attribute-driven image editing and artistic style transfer.

Image manipulation systems have been deployed on a variety of devices ranging from mobile phones to dedicated servers. Some existing image manipulation systems require the use of preset styles, or distinct models for every user input image, resulting in limited or inefficient application. Even in systems which do not require the use of distinct models for every user input, the inference process for user inputs can be inefficient, particularly when the system is running on a less powerful device, such as a basic mobile phone. Moreover, the function of some existing image manipulation systems is not suitable for both casual everyday users (e.g., users modifying images on a mobile phone for entertainment) and professional users (e.g., graphic designers modifying high-resolution images).

In view of the above, a need exists to provide an intelligent image manipulation system that meets the above-mentioned needs.

SUMMARY OF THE INVENTION

The presently disclosed embodiments are directed to solving issues relating to one or more of the problems presented in the prior art, as well as providing additional features that will become readily apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings.

In one embodiment, a feed-forward image manipulation model (i.e., “quick processing model”) is utilized to transform images quickly for everyday use. In another embodiment, an optimization image manipulation model (i.e., “professional processing model”) is utilized for more professional use, which optimizes directly over the output image pixels to minimize both the content loss and style loss. In another embodiment, the user can choose between using the quick processing model and professional processing model via an interface.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict exemplary embodiments of the disclosure. These drawings are provided to facilitate the reader's understanding of the disclosure and should not be considered limiting of the breadth, scope, or applicability of the disclosure. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.

FIG. 1 is a flow chart diagram showing an exemplary end-to-end process of responding to a user requesting image manipulation, and a front-end response to the user, according to embodiments of the invention.

FIG. 2 illustrates a flow chart diagram of an exemplary feed-forward “quick processing” image manipulation model that performs a feed-forward pass starting from the content image and style image, and outputs the stylized output image, according to embodiments of the invention.

FIG. 3 illustrates a flow chart diagram of an exemplary optimization-based “professional processing” image manipulation model that optimizes directly over the output image pixels to minimize both the content loss and style loss, according to embodiments of the invention.

FIG. 4A illustrates front views of a graphical user interface (GUI) for a display screen or portion thereof wherein a user may select a style, including a personalized guiding style image, according to embodiments of the invention.

FIG. 4B illustrates front views of a GUI for a display screen or portion thereof wherein a user may upload one or more content images and the images are arranged to form a grid, and wherein one or more of the images can be selected for transformation, according to embodiments of the invention.

FIG. 5 illustrates front views of a GUI for a display screen or portion thereof wherein a content image is shown gradually transforming into a stylized output image.

FIG. 6 illustrates a flow chart showing an exemplary method of generating stylized output images based on a user's desired content and style, according to embodiments of the invention.

FIG. 7 illustrates a block diagram of an exemplary computer in which embodiments of the invention can be implemented.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The following description is presented to enable a person of ordinary skill in the art to make and use the invention. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the invention. Thus, embodiments of the present invention are not intended to be limited to the examples described herein and shown, but are to be accorded the scope consistent with the claims.

The word “exemplary” is used herein to mean “serving as an example or illustration.” Any aspect or design described herein as “exemplary” should not necessarily be construed as preferred or advantageous over other aspects or designs.

Reference will now be made in detail to aspects of the subject technology, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.

It should be understood that the specific order or hierarchy of steps in the processes disclosed herein is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged while remaining within the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented except as expressly indicated by the claim language.

Embodiments disclosed herein are directed to systems for real-time intelligent manipulation of images into corresponding output images given user-specified guiding styles. In one embodiment, a feed-forward image manipulation model (i.e., “quick processing model”) is utilized to transform images quickly for everyday use. In another embodiment, an optimization image manipulation model (i.e., “professional processing model”) is utilized for more professional use, which optimizes directly over the output image pixels to minimize both the content loss and style loss. Embodiments are disclosed in which a user can choose between using the quick processing model and professional processing model via an interface. Embodiments are also disclosed in which a graphical user interface (GUI) for a display screen or portion thereof allows a user to choose a desired style, upload one or more content images, and choose one or more content images to manipulate.

FIG. 1 is a flow chart diagram 100 showing an exemplary end-to-end process of responding to a user request, including a front-end interface 110 to receive the request, triggering of appropriate back-end models (i.e., quick processing model or professional processing model), image manipulation with the respective model on a back-end server 112 with distributed computing support, and a front-end response to the user, wherein an output image 126 is displayed on the interface, according to embodiments of the invention. Interface 110 can comprise a display and an input device, e.g., a touch screen input device. For ease of explanation, interface 110 is labeled as 110a, 110b, and 110c at different steps of the process 100 in FIG. 1.

At interface 110a, a user is prompted to select a content image 122 and a style image 124. Content image 122 reflects the user's desired content for the stylized output image 126, for example, the desired objects, shapes, and composition of the output image. As discussed in more detail with reference to FIG. 4B below, in some embodiments, the user may upload multiple content images, and apply the image manipulation to each content image simultaneously. Style image 124 reflects the user's desired style (also referred to herein as a style guide) for the stylized output image 126, for example, the desired colors, textures, and patterns of the stylized output image. As discussed in more detail with reference to FIG. 4A below, the style guide may be selected from among several preset styles, or may be generated based on a style image uploaded by the user.

Once content image 122 and style image 124 are chosen by the user, the images are uploaded to a back-end server, such as server 112. The user is then prompted by interface 110b to select either the quick processing model or the professional processing model. The back-end server 112 then performs image manipulation based on the user's choice of processing model, which is described in more detail with reference to FIGS. 2, 3, and 6 below, for example, using additional distributed computing support from multiple CPUs and/or GPUs 114. Finally, the one or more stylized output images 126 are displayed to the user at interface 110c, where the user may choose to save the output images or adjust image manipulation parameters.

Significantly, the process 100 shown in FIG. 1 is “zero-shot,” meaning that the configurations for both the quick processing model and professional processing model are established automatically based on guiding signals, and thus the system does not require individual training or models to process the user's custom content image and/or style image. Because of this, computational complexity and memory usage can be greatly reduced compared to existing style transfer models which are not zero-shot.

The system and process described with reference to process 100 in FIG. 1 is advantageous as it accommodates both wide-scale ordinary uses and advanced professional uses. For example, ordinary users generally need an immediate response when manipulating modest-resolution photos (e.g., taken with mobile phones), while professional users (e.g., graphic designers) require quality results from high-resolution input images, but with orders-of-magnitude shorter processing time compared to manual design and drawing. To this end, the two different image manipulation models (i.e., quick processing model and professional processing model) are offered to the user in order to meet these differing needs.

Both the quick processing model and the professional processing model are based on powerful deep neural networks that allow efficient end-to-end training and enjoy high generalization capacity to unseen styles. For the first task of everyday use, the model is a feed-forward neural network that generates results by taking a simple forward pass through the network structure. For the second task of professional use, the other model performs fine-grained optimization on the output image and produces a result that fuses the content and style in a detailed and coherent manner. Each of these models is described below.

FIG. 2 relates to an embodiment of the present disclosure, wherein a feed-forward image manipulation model is utilized to transform images quickly for everyday use. FIG. 2 illustrates a flow chart diagram 200 of an exemplary feed-forward processing model (i.e., “quick processing model”) that performs a feed-forward pass starting from a content image 222 and style image 224, and outputs a stylized output image 226, according to embodiments of the invention. An example content image 222 and style image 224 are shown in FIG. 2. In this example, the content image 222 is a photograph of a building, and the style image 224 is a portrait drawn in charcoal style with heavy shadows. Both the content image 222 and style image 224 are initially represented as pixels. It should be understood that content image 222 and style image 224 are shown here as examples only, and that the process depicted in diagram 200 may process a variety of content and style images.

As shown, content image 222 and style image 224 can be received by an encoder network 232. Encoder network 232 can comprise a deep convolutional neural network of multiple neural layers 231. Encoder network 232 can extract the feature vectors from both the content image and the style image by applying a series of non-linear transformations. As a result, encoder network 232 generates a content feature vector (i.e., “content features” 252) from content image 222, and a style feature vector (i.e., “style features” 254) from style image 224. In other embodiments not shown, style feature vector 254 is not generated from a style image, but is retrieved from memory storage as a preset style. Content feature vector 252 and style feature vector 254 are each low-dimensional real-value vectors, which represent high-level abstract characteristics of their respective images.
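
By way of illustration only, the following sketch shows one way such an encoder might be realized. It assumes a Python/PyTorch environment and a pretrained VGG-19 backbone truncated at an intermediate layer; the patent does not prescribe a particular network, and the names content_image and style_image denote preloaded, normalized image tensors assumed for this example:

    # Illustrative only: a truncated pretrained VGG-19 standing in for
    # encoder network 232; the patent does not name a specific backbone.
    import torch
    import torchvision.models as models

    class Encoder(torch.nn.Module):
        def __init__(self, n_layers=21):  # layers up to relu4_1 in VGG-19
            super().__init__()
            vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features
            self.layers = torch.nn.Sequential(*list(vgg.children())[:n_layers])
            for p in self.layers.parameters():
                p.requires_grad_(False)  # the encoder itself is fixed

        def forward(self, image):      # image: (1, 3, H, W), normalized
            return self.layers(image)  # feature map, e.g. (1, 512, H/8, W/8)

    encoder = Encoder()
    content_features = encoder(content_image)  # "content features" 252
    style_features = encoder(style_image)      # "style features" 254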

Next, the content feature vector 252 and style feature vector 254 are received by a transformation module 234. Transformation module 234 may comprise another deep convolutional neural network, similar to that described with reference to the encoder network 232. In other configurations, transformation module 234 is incorporated into the same deep convolutional network as encoder network 232. To eliminate the original style encoded in the content feature vector 252 (e.g., the colors and textures of content image 222), a whitening transformation (a linear algebra operation well known in the art) is applied on the content feature vector by the transformation module 234. After applying the whitening transformation, the covariance matrix of the resulting whitening-transformed vector is an identity matrix. The style encoded in the style feature vector 254 is then added to the whitening-transformed content feature vector using a coloring transformation (another linear algebra operation well known in the art). The resulting vector is the stylized feature vector (i.e., “stylized features” 256), which has the same covariance matrix as the style feature vector 254. The stylized feature vector 256 represents the content of content image 222 and the style of style image 224. In some embodiments, the whitening and coloring transformation described above is performed multiple times (e.g., at different layers of the transformation module) in order to capture image elements at different granularities.
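
A minimal sketch of one such whitening-and-coloring transform is given below, operating on feature maps flattened to (channels x pixels) matrices via an eigen-decomposition of each covariance matrix. This is a standard way to realize the transformations described above, not necessarily the patent's exact formulation:

    # Sketch of a whitening-and-coloring transform (WCT); assumes the
    # feature maps come from the hypothetical encoder sketched earlier.
    import torch

    def whiten_and_color(content_feat, style_feat, eps=1e-5):
        # Flatten (C, H, W) feature maps to (C, H*W) and center them.
        c = content_feat.flatten(1)
        s = style_feat.flatten(1)
        c_mean = c.mean(dim=1, keepdim=True)
        s_mean = s.mean(dim=1, keepdim=True)
        c, s = c - c_mean, s - s_mean

        # Whitening: afterwards the covariance of `whitened` is
        # (approximately) the identity matrix.
        c_cov = c @ c.t() / (c.shape[1] - 1) + eps * torch.eye(c.shape[0])
        d, e = torch.linalg.eigh(c_cov)
        whitened = e @ torch.diag(d.rsqrt()) @ e.t() @ c

        # Coloring: impose the style covariance, so the result has the
        # same covariance matrix as the style features.
        s_cov = s @ s.t() / (s.shape[1] - 1) + eps * torch.eye(s.shape[0])
        d_s, e_s = torch.linalg.eigh(s_cov)
        colored = e_s @ torch.diag(d_s.sqrt()) @ e_s.t() @ whitened

        # Re-attach the style mean and restore the spatial layout.
        return (colored + s_mean).view_as(content_feat)

    # "stylized features" 256, from the (squeezed) encoder outputs:
    stylized_features = whiten_and_color(content_features[0], style_features[0])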

Finally, the stylized feature vector 256 is decoded by a decoder module 236 to create the output image 226. For example, the gradient of stylized feature vector 256 may be calculated by decoder module 236 to generate the pixels of output image 226. In one embodiment, the stylized feature vector 256 is fed into the decoder module 236 to generate the pixels of output image 226 after a series of non-linear transformations.
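
The decoder's structure is likewise unspecified in the patent; a common choice, sketched here purely for illustration, mirrors the encoder with convolution and upsampling stages that map the stylized feature map back to RGB pixels. In practice such a decoder would be trained to invert the encoder; the untrained layers below are shown only for shape bookkeeping:

    # Hypothetical mirror-image decoder for decoder module 236: three
    # x2 upsampling stages undo the encoder's three pooling stages.
    import torch.nn as nn

    decoder = nn.Sequential(
        nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(64, 3, 3, padding=1),  # back to RGB pixels
    )
    output_image = decoder(stylized_features.unsqueeze(0))  # output image 226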

FIG. 3 relates to an embodiment of the present disclosure, wherein an optimization-based manipulation model is utilized to minimize content loss and style loss, which is suitable for professional applications. FIG. 3 illustrates a flow chart diagram 300 of an exemplary optimization-based processing model (i.e., “professional processing model”) that optimizes directly over output image pixels according to embodiments of the invention. Unlike the quick processing model described with reference to FIG. 2, which performs a feed-forward pass of the network to produce the stylized image, the professional processing model depicted in diagram 300 performs iterative optimization directly on the pixel values of an output image. To aid in explanation, optimization steps are depicted as dotted lines in FIG. 3.

An example content image 322 and style image 324 are shown in FIG. 3. In this example, the content image 322 is a photograph of a building, and the style image 324 is a floral pattern. As in the quick processing model described with reference to FIG. 2 above, content image 322 and style image 324 are received by an encoder network 332, which can be similar to the encoder network 232 described with reference to FIG. 2.

Encoder network 332 first generates a tentative content feature vector (i.e., “tent. content features” 352) from the content image 322, and a tentative style feature vector (i.e., “tent. style features” 354) from the style image 324. The form of these vectors and the process by which they are generated can be substantially similar to that described with reference to the content feature vector 252 and style feature vector 254, respectively, in FIG. 2.

Next, a loss module 334 receives the tentative content feature vector 352 and tentative style feature vector 354 from the encoder network 332. The loss module 334 is configured to compute a content loss with regard to the tentative content feature vector 352 and a style loss with respect to the tentative style feature vector 354. As a result, the refined stylized pixels 360 are generated to produce the output image 326.

The output image 326 is obtained by computing the gradient of the content loss and style loss with regard to the image pixels, and applying the gradient to the pixels of the tentative output image 325 to get new pixel values. The resulting output image 326 then serves as a tentative output image 325 in the next pass. As used herein, a tentative output image is an output image which has undergone at least a first-pass transformation, but which is not fully optimized by the professional processing model. For example, here the tentative output image 325 may include some elements of the content image 322 and style image 324, but these elements are not easily discerned as the image is not fully optimized.
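
Writing x_t for the pixels of the tentative output image 325 at iteration t, this update can be expressed as a standard gradient-descent step (the patent does not fix a particular step rule; plain gradient descent is shown for concreteness):

    x_{t+1} = x_t - \eta \, \nabla_{x} \left( \mathcal{L}_{\mathrm{content}}(x_t) + \lambda \, \mathcal{L}_{\mathrm{style}}(x_t) \right)

where \eta is a step size and \lambda balances the style loss against the content loss.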

Next, the tentative output image 325 is received by the encoder network 332, which encodes the tentative output image 325 into two vectors: a refined content feature vector (i.e., “refined content features” 356), and a refined style feature vector (i.e., “refined style features” 358). As used herein, the refined content feature vector 356 and refined style feature vector 358 are vectors which are generated from a tentative output image, which has undergone at least the first-pass transformation discussed above.

Next, the loss module 334 receives the refined content feature vector 356 and refined style feature vector 358 and determines various loss factors for the models to optimize. For example, the loss module 334 may compare the tentative content feature vector 352 (based on the content image 322) to the refined content feature vector 356 (based on the tentative output image 325) to determine a content loss factor. Likewise, the loss module 334 may compare the tentative style feature vector 354 (based on the style image 324) to the refined style feature vector 358 (based on the tentative output image 325) to determine a style loss factor. The loss module 334 then optimizes the refined content feature vector 356 and refined style feature vector 358, and thereby generates the refined stylized pixels 360.

If the optimization process is complete, the refined stylized pixels 360 are used to construct the final output image 326. However, if the optimization process is not complete, the refined stylized pixels 360 are used to produce a new tentative output image 325, and the optimization process repeats. Repeating the optimization process on the new tentative output image 325 yields a further optimized refined content feature vector 356, a further optimized refined style feature vector 358, new content and style loss factors, and further optimized refined stylized pixels 360, which, in turn, are decoded into yet another tentative output image. This process repeats until the optimization process is complete, which can be determined by a preset parameter stored in memory (e.g., a pre-defined number of iterations), or by a user input (e.g., the user stopping the process at a desired level of optimization). In some examples, if a user stops the process at a desired level of optimization, this desired level can be recorded and used to fine-tune the preset optimization parameters stored in memory.
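
The loop just described can be condensed into the following sketch, which reuses the hypothetical encoder from the FIG. 2 discussion; the loss helpers, style_weight, and num_iterations are illustrative stand-ins rather than elements named by the patent:

    # Sketch of the iterative "professional processing" loop of FIG. 3.
    import torch
    import torch.nn.functional as F

    def feature_cov(feat):                 # feat: (1, C, H, W)
        f = feat.flatten(2).squeeze(0)     # (C, H*W)
        f = f - f.mean(dim=1, keepdim=True)
        return f @ f.t() / (f.shape[1] - 1)

    tent_content = encoder(content_image).detach()  # "tent. content features" 352
    tent_style = encoder(style_image).detach()      # "tent. style features" 354

    # The tentative output image 325, initialized from the content image
    # and optimized directly at the pixel level.
    output = content_image.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([output], lr=0.01)

    for step in range(num_iterations):     # preset parameter, or user-stopped
        optimizer.zero_grad()
        refined = encoder(output)          # refined features 356/358
        content_loss = F.mse_loss(refined, tent_content)
        style_loss = F.mse_loss(feature_cov(refined), feature_cov(tent_style))
        loss = content_loss + style_weight * style_loss
        loss.backward()                    # gradient w.r.t. the output pixels
        optimizer.step()                   # apply the gradient to the pixels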

The objective of the professional processing model described in diagram 300 is to minimize the Euclidean distance between the neural features of the content image 322 and the output image 326, and to minimize the Euclidean distance between the covariance matrices of the neural features of the style image 324 and the output image 326. The first objective helps the output image 326 to better retain the semantic details of the content image 322, while the second drives the output image to capture the colors and textures of the style image 324. Thus, the direct optimization on output pixels described in this embodiment gives a more intuitive way of combining content and style, and can generate high-quality realistic images with a natural fusion of content and style details.
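
Stated formally, with F(·) denoting the encoder's neural features, Cov(·) their covariance matrix, c the content image 322, s the style image 324, and x the output image 326 (notation introduced here for illustration), the two objectives are:

    \min_{x} \; \big\lVert F(x) - F(c) \big\rVert_2
    \qquad \text{and} \qquad
    \min_{x} \; \big\lVert \mathrm{Cov}(F(x)) - \mathrm{Cov}(F(s)) \big\rVert_2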

It should be understood that the quick processing model described with reference to FIG. 2 and the professional processing model described with reference to FIG. 3 can each be used alone, in combination, or together in a system that gives the user a choice of either model, as described with reference to the example process 100 in FIG. 1. Additionally, though not illustrated in FIGS. 2-3, some or all of the elements of the processes described may be distributed to one or more external devices, such as central processing units (CPUs) or graphics processing units (GPUs). In these examples, a distribution module in communication with the various modules and networks can distribute data and instructions to the external devices for efficient training and real-time responses.

FIG. 4A illustrates front views 400 of a graphical user interface (GUI) for a display screen or portion thereof showing how a user may select a style, including a personalized guiding style image (e.g., style images 224 and 324 discussed above), according to embodiments of the invention. The GUI depicted in FIGS. 4A-4B may correspond, for example, to a GUI on interface 110 discussed with reference to FIG. 1 above. As shown in screens 401-404, a user can choose a desired style image, representative of the style (e.g., color, texture, and patterns) desired in the output image. Screens 401 and 402 of FIG. 4A illustrate example preset artistic styles that might be presented to the user, for example, an impressionist style characteristic of Claude Monet or a post-impressionist style characteristic of Vincent van Gogh. The user can then select one of these artistic styles, as shown in screen 403. In the case where the user selects a preset artistic style, the system may load the style feature vector (or tentative style feature vector) from memory, rather than generating the style feature vector from the style image itself. In addition, a preset artistic style may have one or more associated preset parameters (e.g., training parameters). As shown in screen 404, if users do not select a preset artistic style, they may upload their own guiding style image, from which the style feature vector can be generated.

FIG. 4B illustrates front views 410 of a GUI for a display screen or portion thereof wherein a user may upload one or more content images (e.g., content images 222 and 322 discussed above) and the images are arranged to form a grid, and wherein one or more of the images can be selected for transformation, according to embodiments of the invention. As shown in screen 411, a user can activate a first button 421 to upload a content image, and can activate a second button 422 to upload a style image or select a preset artistic style. A display region 423 at the right of the screen can display the uploaded content images. Screen 412 depicts the uploading of a first content image, which is added to the display region 423 in screen 413. Screens 414-416 depict the user uploading three additional content images. As shown, as the number of uploaded content images increases, the display area automatically arranges the size and position of the content images to form a grid for easy review by the user. Screens 417-419 depict the user selecting multiple content images in the display area 423 for transformation. In this way, the user can select one or more content images from all of the uploaded content images, and apply the transformation to the selected images simultaneously.

FIG. 5 illustrates front views 500 of a GUI for a display screen or portion thereof wherein a content image (e.g., content image 322) is shown gradually transforming into a stylized output image (e.g., output image 326). In screen 501, a user can direct the system to begin the transformation. With each iteration shown in screens 502-504, the image is gradually transformed into the output image. In some cases, the image shown in each screen may correspond to the tentative output image 325 explained with respect to FIG. 3 above; that is, each image shown in screens 502-504 can be a progressively optimized output image under the professional processing model. In some examples, the user can pause the transformation at a desired optimization level, and save or download the image at that optimization level. The concepts illustrated in FIG. 5 can be combined with those shown in FIG. 4 such that multiple content images are selected, and the gradual transformation of each of the content images into a respective output image is shown simultaneously. If users are satisfied with the results, they may download the images individually or in groups. After viewing the results, users can choose to explore a new pre-trained style, upload a new personalized guiding style image, or exit the interface.

FIG. 6 illustrates a flow chart 600 showing an exemplary method of generating stylized output images based on a user's desired content and style, according to embodiments of the invention. This method may correspond, for example, to the processes described with reference to FIGS. 1, 2, and 3 above.

At step 601, a content image representative of the user's desired content is selected by the user. At step 602, the user chooses either a quick processing model or a professional processing model.

The next step is to generate a stylized feature vector, and the method of doing so varies depending on whether the user chooses the quick processing model or the professional processing model at step 602. If the user chose the quick processing model, the next step is step 610, wherein the content image is encoded into a content feature vector and the style image is encoded into a style feature vector, and at step 611, the stylized feature vector is generated. For example, in step 611, the stylized feature vector may be generated using the whitening and coloring transformations described with reference to FIG. 2 above.

Alternatively, if the user selected the professional processing model at step 602, the stylized feature vector is generated as follows. At step 620, the content image is encoded into a tentative content feature vector and the style image is encoded into a tentative style feature vector. At step 621, a tentative stylized feature vector is generated. At step 622, the tentative stylized feature vector is decoded into a tentative output image. Next, the method enters an optimization operation encompassing steps 630-638. At step 630, the tentative output image is encoded into a refined content feature vector and a refined style feature vector. At step 631, the tentative content feature vector is compared to the refined content feature vector, and the tentative style feature vector is compared to the refined style feature vector, to determine a respective content loss parameter and style loss parameter. At step 632, the refined content feature vector and refined style feature vector are optimized based on the content loss and style loss parameters. At step 633, a refined stylized feature vector is generated based on the refined content feature vector and refined style feature vector. At step 634, a determination is made as to whether the refined stylized feature vector is sufficiently optimized. If it is not optimized, the refined stylized feature vector is decoded into a tentative output image, and the optimization operation repeats, as shown in step 635. However, if the refined stylized feature vector is optimized at step 634, then it is saved as the stylized feature vector, as shown in step 636.
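
At a high level, the branch between steps 610-611 and steps 620-636 can be summarized by the following sketch, which reuses the hypothetical encoder, whiten_and_color, and decoder helpers from the FIG. 2 discussion; professional_stylized_vector stands in for the optimization loop sketched alongside FIG. 3 and is not a name used by the patent:

    # Condensed dispatch for flow chart 600 (illustrative only).
    def stylize(content_image, style_image, model_choice):
        if model_choice == "quick":
            c_feat = encoder(content_image)[0]           # step 610
            s_feat = encoder(style_image)[0]             # step 610
            stylized = whiten_and_color(c_feat, s_feat)  # step 611
        else:
            # steps 620-636: iterative refinement until sufficiently
            # optimized, yielding the final stylized feature vector.
            stylized = professional_stylized_vector(content_image, style_image)
        return decoder(stylized.unsqueeze(0))            # step 641: decode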

Once the stylized feature vector is generated (either by the quick processing model or the professional processing model), it is decoded into the output image, as shown in step 641. Finally, in step 642, the output image is displayed to the user.

FIG. 7 illustrates a block diagram of a computer 10 in which embodiments of the invention can be implemented. Computer 10 can perform any of the methods described with reference to FIGS. 1-4 above. Computer 10 can include one or more processors (CPU) 11, storage (memory) 12, an input unit 13, a display unit 14, and a network interface (I/F) 15 configured to interface with a network 20. These components may interface with one another via a bus 16. Applications 17 may be stored on memory 12 and may include data and instructions for performing any of the methods described in this disclosure, including those described with reference to FIG. 6. In some embodiments, computer 10 can be configured to work in conjunction with a back-end server, such as the back-end server 112 described with reference to FIG. 1.

While various embodiments of the invention have been described above, it should be understood that they have been presented by way of example only, and not by way of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosure, which is done to aid in understanding the features and functionality that can be included in the disclosure. The disclosure is not restricted to the illustrated example architectures or configurations, but can be implemented using a variety of alternative architectures and configurations. Additionally, although the disclosure is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described. They instead can be applied, alone or in some combination, to one or more of the other embodiments of the disclosure, whether or not such embodiments are described, and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments.

In this document, the term “module,” as used herein, refers to software, firmware, hardware, and any combination of these elements for performing the associated functions described herein. Additionally, for purposes of discussion, the various modules are described as discrete modules; however, as would be apparent to one of ordinary skill in the art, two or more modules may be combined to form a single module that performs the associated functions according to embodiments of the invention.

In this document, the terms “computer program product,” “computer-readable medium,” and the like may be used generally to refer to media such as memory storage devices or storage units. These, and other forms of computer-readable media, may be involved in storing one or more instructions for use by a processor to cause the processor to perform specified operations. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system to perform the specified operations.

It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known,” and terms of similar meaning should not be construed as limiting the item described to a given time period, or to an item available as of a given time. Instead, these terms should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now, or at any time in the future. Likewise, a group of items linked with the conjunction “and” should not be read as requiring that each and every one of those items be present in the grouping, but rather should be read as “and/or” unless expressly stated otherwise. Similarly, a group of items linked with the conjunction “or” should not be read as requiring mutual exclusivity among that group, but rather should also be read as “and/or” unless expressly stated otherwise. Furthermore, although items, elements or components of the disclosure may be described or claimed in the singular, the plural is contemplated to be within the scope thereof unless limitation to the singular is explicitly stated. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to,” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the invention.

Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by, for example, a single unit or processing logic element. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined. The inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also, the inclusion of a feature in one category of claims does not imply a limitation to this category, but rather the feature may be equally applicable to other claim categories, as appropriate.

What is claimed is:
1. A system for manipulating images according to user-chosen styles, comprising: a memory; an interface device in communication with the memory and configured to allow a user to: upload one or more images to the memory, select a content image from the one or more images, select a style image from the one or more images, and set one or more process parameters stored in the memory; an encoder network comprising multiple neural computing layers configured to generate a content feature vector by extracting content features from the content image according to the process parameters and generate a style feature vector by extracting style features from the style image according to the process parameters; a transformation module configured to generate a stylized feature vector based on the content feature vector and the style feature vector; and a decoder module configured to generate an output image based on the stylized feature vector; wherein the interface is further configured to display the output image to the user.
2. The system of claim 1, wherein: the process parameters include an indication for the system to use an optimization-based model; the decoder module is further configured to generate a tentative output image based on the stylized feature vector; the encoder network is further configured to: receive the tentative output image from the decoder, generate a tentative content feature vector by extracting content features from the tentative output image, and generate a tentative style feature vector by extracting style features from the tentative output image; the transformation module is further configured to generate a refined feature vector based on the content feature vector, tentative content feature vector, style feature vector, and tentative style feature vector; and the decoder module is further configured to: generate a refined output image based on the refined feature vector, and, until an optimization parameter is reached, repeatedly perform an optimization iteration by sending the refined output image as the tentative output image to the encoder network and receiving the resulting refined feature vector from the transformation module.
3. The system of claim 2, wherein the decoder module is further configured to repeatedly send the refined output image to the interface to be displayed with each optimization iteration.
4. The system of claim 3, wherein: the optimization parameter is configured to be set by the user; and the interface is configured to, after displaying the refined output image, prompt the user for a choice whether to advance optimization or stop optimization, thereby setting the optimization parameter.
5. The system of claim 2, wherein the transformation module is further configured to: compare the content feature vector to the tentative content feature vector to determine a content loss parameter; compare the style feature vector to the tentative style feature vector to determine a style loss parameter; and generate the refined feature vector by adjusting the tentative content feature vector based on the content loss parameter and adjusting the tentative style feature vector based on the style loss parameter.
6. The system of claim 1, wherein: the process parameters include an indication for the system to use a feed-forward model; generating the content feature vector by extracting content features from the content image includes generating an initial content image vector and performing one or more whitening transformations on the initial content image vector; and generating the stylized feature vector by the transformation module includes performing a coloring transformation on the content feature vector.
7. The system of claim 6, wherein each of the neural computing layers of the encoder network is configured to perform one of the one or more whitening transformations, with each respective whitening transformation being performed at a different granularity level.
8. The system of claim 1, further comprising a distribution module configured to distribute operations of at least one of the encoder network, transformation module, or decoder module to an external processor.
9. The system of claim 1, wherein: one of the process parameters set by the user includes a content amount parameter indicative of the user's desired amount of content features in the output image, and a style amount parameter indicative of the user's desired amount of style features in the output image; and the transformation module is configured to generate the stylized feature vector further based on the content amount parameter and the style amount parameter.
10. The system of claim 1, wherein the interface is configured to allow the user to select a plurality of content images from the one or more images, and the interface is configured to display a resulting plurality of output images corresponding to the plurality of content images.
11. A method of image manipulation based on a user's desired content and style, comprising: obtaining a content image representative of the user's desired content; obtaining a style image representative of the user's desired style; obtaining a choice of either a quick processing model or a professional processing model; generating a stylized feature vector; decoding the stylized feature vector into an output image having the user's desired content and style; and displaying the output image; wherein, if the quick processing model is chosen, generating the stylized feature vector comprises: encoding the content image into a content feature vector; encoding the style image into a style feature vector; and generating the stylized feature vector based on the content feature vector and the style feature vector; and wherein, if the professional processing model is chosen, generating the stylized feature vector comprises: determining an optimization parameter; encoding the content image into a tentative content feature vector; encoding the style image into a tentative style feature vector; generating a tentative stylized feature vector based on the tentative content feature vector and tentative style feature vector; applying the tentative stylized feature vector to generate a tentative output image; and repeating an optimization operation for a number of optimization iterations determined by the optimization parameter, the optimization operation comprising: encoding content elements of the tentative output image into a refined content feature vector; encoding style elements of the tentative output image into a refined style feature vector; determining a content loss parameter by comparing the tentative content feature vector to the refined content feature vector; determining a style loss parameter by comparing the tentative style feature vector to the refined style feature vector; optimizing the refined content feature vector and refined style feature vector based on the content loss parameter and style loss parameter; generating a refined stylized feature vector based on the refined content feature vector and refined style feature vector; if the refined stylized feature vector is not sufficiently optimized according to the optimization parameter, decoding the refined stylized feature vector into a refined output image, saving the refined output image as the tentative output image, and repeating the optimization operation; and if the refined stylized feature vector is sufficiently optimized according to the optimization parameter, saving the refined stylized feature vector as the stylized feature vector.
12. The method of claim 11, wherein, if the professional processing model is chosen, the optimization parameter is based on a desired level of optimization chosen by the user.
13. The method of claim 11, wherein, if the professional processing model is chosen, the optimization operation further comprises displaying the refined output image with each optimization iteration.
14. The method of claim 11, wherein, if the quick processing model is chosen, the content feature vector has a corresponding covariance matrix which is an identity matrix, and the stylized feature vector has the same covariance matrix.
15. The method of claim 11, wherein, if the quick processing model is chosen, encoding the content image into the content feature vector includes performing a feed-forward pass to remove style elements present in the content image.
16. The method of claim 15, wherein the feed-forward pass to remove style elements present in the content image includes performing a whitening transformation on the content feature vector, and generating the stylized feature vector based on the content feature vector and the style feature vector includes performing a coloring transformation on the content feature vector based on the style feature vector.
17. The method of claim 16, wherein the whitening transformation and coloring transformation are each performed a plurality of times on the content image at different levels of granularity.
18. The method of claim 11, further comprising: obtaining a number of content images; generating a plurality of stylized feature vectors including a respective stylized feature vector for each of the number of content images; applying the plurality of respective stylized feature vectors to generate respective output images; and displaying the output images, each output image having a respective display position and display size chosen based on the number of content images such that the output images are arranged in a grid.
19. The method of claim 11, wherein obtaining the style image includes the user uploading the style image.
20. A non-transitory computer-readable medium including instructions which, when executed by one or more processors, cause the processors to perform a method comprising: obtaining a content image representative of a user's desired content; obtaining a style image representative of the user's desired style; obtaining a choice of either a quick processing model or a professional processing model; generating a stylized feature vector; applying the stylized feature vector to generate an output image having the user's desired content and style; and displaying the output image; wherein, if the quick processing model is chosen, generating the stylized feature vector comprises: encoding the content image into a content feature vector; encoding the style image into a style feature vector; and generating the stylized feature vector based on the content feature vector and the style feature vector; and wherein, if the professional processing model is chosen, generating the stylized feature vector comprises: determining an optimization parameter; encoding the content image into a tentative content feature vector; encoding the style image into a tentative style feature vector; generating a tentative stylized feature vector based on the tentative content feature vector and tentative style feature vector; applying the tentative stylized feature vector to generate a tentative output image; and repeating an optimization operation for a number of optimization iterations determined by the optimization parameter, the optimization operation comprising: encoding content elements of the tentative output image into a refined content feature vector; encoding style elements of the tentative output image into a refined style feature vector; determining a content loss parameter by comparing the tentative content feature vector to the refined content feature vector; determining a style loss parameter by comparing the tentative style feature vector to the refined style feature vector; optimizing the refined content feature vector and refined style feature vector based on the content loss parameter and style loss parameter; generating a refined stylized feature vector based on the refined content feature vector and refined style feature vector; if the refined stylized feature vector is not sufficiently optimized according to the optimization parameter, applying the refined stylized feature vector to generate a refined output image, saving the refined output image as the tentative output image, and repeating the optimization operation; and if the refined stylized feature vector is sufficiently optimized according to the optimization parameter, saving the refined stylized feature vector as the stylized feature vector.